MON-4477: chore: add permissions on endpointslice to Prometheus Role and use serviceDiscoveryRole: EndpointSlice in ServiceMonitors #1305

Open
machine424 wants to merge 1 commit into openshift:main from machine424:ttr

@machine424 machine424 commented Jan 23, 2026

This PR migrates Prometheus service discovery from the deprecated Endpoints API to the EndpointSlice API by:

  • Setting serviceDiscoveryRole: EndpointSlice on ServiceMonitors.
  • Granting Prometheus endpointslices permissions.
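For illustration, a ServiceMonitor that opts into EndpointSlice-based discovery might look roughly like the sketch below. It is not the actual contents of install/0000_90_cluster-version-operator_02_servicemonitor.yaml: the ServiceMonitor name and the selector label are assumptions, the `metrics` port name comes from the Endpoints dump later in this thread, and scheme, TLS, and authentication settings are left out.

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: cluster-version-operator          # assumed name
  namespace: openshift-cluster-version
spec:
  # Discover scrape targets from discovery.k8s.io/v1 EndpointSlices
  # instead of the deprecated core/v1 Endpoints API.
  serviceDiscoveryRole: EndpointSlice
  selector:
    matchLabels:
      k8s-app: cluster-version-operator   # hypothetical label, for illustration only
  endpoints:
  - port: metrics                         # named Service port exposing CVO metrics
    # scheme, tlsConfig, and auth settings omitted from this sketch
```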

We're taking a conservative approach by keeping the existing endpoints permissions alongside the new endpointslices ones. This provides a safety net in case any ServiceMonitors, whether deployed from this repo or from another source, still rely on the same Role and were missed during the migration.
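A Role that keeps the existing Endpoints access and adds EndpointSlice access could be sketched as follows. The Role name and the exact list of core resources in the pre-existing rule are assumptions; the new rule (endpointslices in the discovery.k8s.io API group, with get, list, and watch) is the one this PR describes.

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: prometheus-k8s                    # assumed name
  namespace: openshift-cluster-version
rules:
# Existing core-API rule, kept as a safety net for any ServiceMonitor
# still using Endpoints-based discovery (resource list assumed).
- apiGroups: [""]
  resources: ["services", "endpoints", "pods"]
  verbs: ["get", "list", "watch"]
# New rule: lets Prometheus read EndpointSlices.
- apiGroups: ["discovery.k8s.io"]
  resources: ["endpointslices"]
  verbs: ["get", "list", "watch"]
```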

Since both resources expose essentially the same data, keeping both is not meaningfully more permissive from a security standpoint.

These changes target OpenShift 4.22+ and should not be backported to earlier releases.


coderabbitai Bot commented Jan 23, 2026

Walkthrough

Updates Prometheus RBAC configuration and service discovery for the cluster-version-operator to support EndpointSlice resources. Adds RBAC permissions for endpointslices in the discovery.k8s.io API group and configures the ServiceMonitor to use EndpointSlice for service discovery.

Changes

| Cohort | File(s) | Summary |
|---|---|---|
| Cluster-version-operator EndpointSlice configuration | install/0000_90_cluster-version-operator_00_prometheusrole.yaml, install/0000_90_cluster-version-operator_02_servicemonitor.yaml | Adds RBAC rule for endpointslices resource access in discovery.k8s.io API group with get, list, watch verbs. Updates ServiceMonitor to configure Prometheus service discovery to use EndpointSlice. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes




machine424 commented Jan 23, 2026

/retitle MON-4477: chore: add permissions on endpointslice to Prometheus Role and use serviceDiscoveryRole: EndpointSlice in ServiceMonitors

@openshift-ci openshift-ci Bot changed the title chore: add permissions on endpointslice to Prometheus Role and use serviceDiscoveryRole: EndpointSlice in ServiceMonitors MON-4477: chore: add permissions on endpointslice to Prometheus Role and use serviceDiscoveryRole: EndpointSlice in ServiceMonitors Jan 23, 2026
@openshift-ci-robot openshift-ci-robot added the jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. label Jan 23, 2026

openshift-ci-robot commented Jan 23, 2026

@machine424: This pull request references MON-4477 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the openshift-eng/jira-lifecycle-plugin repository.


openshift-ci-robot commented Feb 9, 2026

@machine424: This pull request references MON-4477 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.


@machine424
Author

/retest-required

1 similar comment
@machine424
Author

/retest-required

@machine424
Author

/verified by existing tests
/jira refresh

@openshift-ci-robot openshift-ci-robot added the verified Signifies that the PR passed pre-merge verification criteria label Mar 9, 2026
@openshift-ci-robot
Contributor

@machine424: This PR has been marked as verified by existing tests.



openshift-ci-robot commented Mar 9, 2026

@machine424: This pull request references MON-4477 which is a valid jira issue.

Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the task to target the "4.22.0" version, but no target version was set.


@Tai-RedHat

/retest

@simonpasquier

/lgtm

@openshift-ci openshift-ci Bot added the lgtm Indicates that a PR is ready to be merged. label Mar 30, 2026
@machine424
Author

/retest


@wking wking left a comment


Poking around to understand the context here: prometheus-operator/prometheus-operator#6672 -> prometheus-operator/prometheus-operator#3862 -> prometheus/prometheus#6838 -> the Kube docs, which explain the benefits of EndpointSlices for Services backed by many endpoints. That's not the case for this CVO Service, though; we just have the one backing endpoint, e.g. in this 4.22.0-rc.1 CI run:

$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-serial-1of2/2047770526054617088/artifacts/e2e-aws-ovn-serial/gather-extra/artifacts/endpoints.json | jq -r '.items[] | select(.metadata.namespace == "openshift-cluster-version") | {name: .metadata.name, subsets}'
{
  "name": "cluster-version-operator",
  "subsets": [
    {
      "addresses": [
        {
          "ip": "10.0.95.185",
          "nodeName": "ip-10-0-95-185.ec2.internal",
          "targetRef": {
            "kind": "Pod",
            "name": "cluster-version-operator-d747d47c9-zz95q",
            "namespace": "openshift-cluster-version",
            "uid": "0e69c72e-2c44-45bc-b46a-a8008d46271d"
          }
        }
      ],
      "ports": [
        {
          "name": "metrics",
          "port": 9099,
          "protocol": "TCP"
        }
      ]
    }
  ]
}
$ curl -s https://gcsweb-ci.apps.ci.l2s4.p1.openshiftapps.com/gcs/test-platform-results/logs/periodic-ci-openshift-release-main-nightly-4.22-e2e-aws-ovn-serial-1of2/2047770526054617088/artifacts/e2e-aws-ovn-serial/gather-extra/artifacts/endpointslices.json | jq -r '.items[] | select(.metadata.namespace == "openshift-cluster-version") | {name: .metadata.name, endpoints}'
{
  "name": "cluster-version-operator-l52jj",
  "endpoints": [
    {
      "addresses": [
        "10.0.95.185"
      ],
      "conditions": {
        "ready": true,
        "serving": true,
        "terminating": false
      },
      "nodeName": "ip-10-0-95-185.ec2.internal",
      "targetRef": {
        "kind": "Pod",
        "name": "cluster-version-operator-d747d47c9-zz95q",
        "namespace": "openshift-cluster-version",
        "uid": "0e69c72e-2c44-45bc-b46a-a8008d46271d"
      },
      "zone": "us-east-1c"
    }
  ]
}

However, MON-4477 points out:

Endpoints API is deprecated https://kubernetes.io/blog/2025/04/24/endpoints-deprecation/

And that's a great reason to move off the deprecated-in-Kubernetes-1.33 API, which this pull delivers. OCP 4.22 is based on Kubernetes 1.35, so I'm unclear on why we haven't been getting APIRemovedInNextReleaseInUse alerting since OCP 4.20. Possibly that's because there is no clear plan to remove Endpoints. I opened PI-1510 back in 2022 asking after alert coverage for deprecated APIs, but that ticket's been pretty quiet.

But context aside, looks good to me, thanks! Coverage is well-exercised in pre-merge CI (e.g. TargetDown and ClusterVersionOperatorDown alerting would fail us if we broke CVO monitoring), so there's no risk of destabilizing other CI or creating QE load:

/lgtm
/label acknowledge-critical-fixes-only

@openshift-ci openshift-ci Bot added the acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. label Apr 29, 2026

openshift-ci Bot commented Apr 29, 2026

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: machine424, simonpasquier, wking

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Apr 29, 2026
@openshift-merge-bot
Contributor

/retest-required

Remaining retests: 0 against base HEAD 39733be and 2 for PR HEAD 9435bf1 in total

@simonpasquier

OCP 4.22 is based on Kubernetes 1.35, so I'm unclear on why we haven't been getting APIRemovedInNextReleaseInUse alerting since OCP 4.20. Possibly that's because there is no clear plan to remove Endpoints.

Correct, the Endpoints API is so ingrained in Kubernetes that it's likely never going to be removed.


openshift-ci Bot commented Apr 29, 2026

@machine424: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

| Test name | Commit | Details | Required | Rerun command |
|---|---|---|---|---|
| ci/prow/e2e-agnostic-ovn-techpreview-serial-3of3 | 9435bf1 | link | unknown | /test e2e-agnostic-ovn-techpreview-serial-3of3 |
| ci/prow/e2e-hypershift-conformance | 9435bf1 | link | unknown | /test e2e-hypershift-conformance |
| ci/prow/e2e-agnostic-ovn-upgrade-out-of-change | 9435bf1 | link | unknown | /test e2e-agnostic-ovn-upgrade-out-of-change |
| ci/prow/e2e-aws-ovn-techpreview | 9435bf1 | link | unknown | /test e2e-aws-ovn-techpreview |
| ci/prow/e2e-agnostic-ovn-upgrade-into-change | 9435bf1 | link | unknown | /test e2e-agnostic-ovn-upgrade-into-change |
| ci/prow/e2e-hypershift | 9435bf1 | link | unknown | /test e2e-hypershift |


Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.


Labels

acknowledge-critical-fixes-only Indicates if the issuer of the label is OK with the policy. approved Indicates a PR has been approved by an approver from all required OWNERS files. jira/valid-reference Indicates that this PR references a valid Jira ticket of any type. lgtm Indicates that a PR is ready to be merged. verified Signifies that the PR passed pre-merge verification criteria


5 participants